The MAGICDATA Mandarin Chinese Read Speech Corpus was developed by MAGIC DATA
Technology Co., Ltd. and is freely available for non-commercial use.
<p>
The contents and the corresponding descriptions of the corpus include:
<p>
<ul>
  <li> The corpus contains <strong> 755 hours </strong> of speech data, most of
  which was recorded on mobile devices.</li>
  <li> <strong> 1080 speakers </strong> from different accent areas in China
  took part in the recording.</li>
  <li> The sentence transcription accuracy is higher than 98%.</li>
  <li> Recordings were made in a quiet indoor environment.</li>
  <li> The database is divided into a training set, a validation set, and a
  test set in a ratio of 51:1:2.</li>
  <li> Detailed information, such as speech data coding and speaker
  information, is preserved in the metadata file.</li>
  <li> The recording texts cover diverse domains, including interactive
  Q&amp;A, music search, SNS messages, home command and control, etc.</li>
  <li> Segmented transcripts are also provided.</li>
</ul>

The corpus aims to support researchers in speech recognition, machine
translation, speaker recognition, and other speech-related fields. For that
reason, the corpus is free for academic use.
<p>

The corpus is a subset of a much larger data set (<strong>10566.9 hours Chinese
Mandarin Speech Corpus</strong>) that was recorded in the same
environment. Please feel free to contact us
via <a href="mailto:business@magicdatatech.com">business@magicdatatech.com</a> for more
details.
<p>

<strong>Citation</strong>
<p>
Please cite the corpus as: Magic Data Technology Co., Ltd.,
"http://www.imagicdatatech.com/index.php/home/dataopensource/data_info/id/101",
05/2019.
<p>

<strong>About us</strong>
<p>
Magic Data Technology Co., Ltd. (referred to as Magic Data) was established in
2016. Through its high-expertise, high-precision data services, Magic
Data has quickly grown into one of the foremost companies in the artificial
intelligence industry. We strive to provide the most efficient and
highest-quality one-stop data services for customers in the fields of speech
recognition, intelligent imaging, and natural language understanding (NLU). Our
services include data scheme design, data collection, and data
annotation/transcription, among others.
<p>

<strong>Contact</strong>
<p>
<ul>
  <li> Tel:  (+86) 10-82527250</li>
  <li> Email:  business@magicdatatech.com</li>
  <li> http://www.imagicdatatech.com</li>
</ul>
<p>
